Abstract

Random walks are gaining much attention from the networks research community. They
are the basis of many proposals aimed at solving a variety of network-related problems such
as resource location, network construction, node sampling, etc. This interest in random
walks is justified by their inherent properties. They are very simple to implement, as nodes
only require local information to make routing decisions. Also, random walks demand little
processing power and bandwidth. Besides, they are very resilient to changes in the network
topology.

Here, we quantify the effectiveness of random walks as a search mechanism in one-hop
replication networks: networks where each node knows its neighbors' identity/resources,
and so it can reply to queries on their behalf. Our model focuses on estimating the expected
average search time of the random walk by applying network queuing theory. To do this,
we must first provide the expected average search length. This is computed by means of
estimations of the expected average coverage at each step of the random walk. This model
takes into account the revisiting effect: the fact that, as the random walk progresses, the
probability of arriving at nodes already visited increases, which affects how the network
coverage evolves. That is, we do not model the coverage as a memoryless process.
Furthermore, we conduct a series of simulations to evaluate, in practice, the
above-mentioned metrics. Our results show a very close correlation between the analytical
and the experimental results.
1 Introduction



Random walks are a mechanism to route messages through a network. At each
hop of the random walk, the node holding the message forwards it to some neighbor
chosen uniformly at random. Random walks have interesting properties: they
produce little overhead and network nodes require only local information to route
messages. In turn, this makes random walks resilient to changes in the network
structure. Thanks to these features, random walks are useful for different applications,
like routing, searching, sampling and self-stabilization in diverse distributed
systems such as Peer-to-Peer (P2P) and wireless networks [1-10].

Past works have addressed the study of random walks. Some of this research has
focused on the coverage problem, trying to find bounds for the expected number of
hops taken by a random walk to visit all vertices (nodes) in a graph $G$ ($C_G$)^1 [11-14].
Results vary from the optimal $C_G$ of complete graphs, $\Theta(n \log n)$ [11] (where $n$
is the number of vertices), to the worst case found in the lollipop graph, $\Theta(n^3)$ [15].
Barnes and Feige [16] generalize this bound to the expected number of hops to
cover a fraction ($f < n$) of the vertices of the network, which they found is $O(f^3)$.
Other works are devoted to finding bounds on the expected number of
steps before a given node $j$ is visited starting from node $i$ ($H_{i,j}$). For example, it is
known that the upper bound for $H_{i,j}$ is $\Theta(n^3)$ [17]. Many of these results are based
on the study of the properties of the transition matrix $P$ and adjacency matrix $A$ in
spectral form [18].

The previous results are used in several works to discuss the properties of random 
walks in communication networks. Gkantsidis et al. [19] apply them to argue that 
random walks can simulate random sampling on P2P networks, a property that in 
their opinion justifies the 'success of the random walk method' when proposed as
a search tool [3] or as a network construction method [9]. Adamic et al. [20] study
the search process by random walks in power-law networks applying the generating 
function formalism. This work seems deeply inspired by a previous contribution of 
Newman et al. [21], who study the properties (mean component size, giant compo- 
nent size, etc.) of random graphs with arbitrary degree distribution. 

This paper introduces a study of random walks from a different perspective. It does
not study the formal bounds on the number of hops needed to cover the network. Instead, it
tries to estimate the efficiency of the random walk as a search mechanism in
communication networks, applying network queuing theory. It takes into account the
bounded processing capacities of the nodes of the network and the load introduced
by the search messages, which are routed using random walks. To obtain this load, we
need to estimate first the average search length, which in turn is computed from the
expected average coverage: the average number of different nodes covered at each
hop of the random walk. A distinguishing feature of our work is that, as in the case
of Adamic et al. [20], it deals with a scenario that has not been explored exhaustively
but that, in our opinion, is quite interesting in the communications field:
one-hop replication networks.

^1 The term time is commonly used in previous works to refer to the number of hops of the
random walk (that is, its length). Thus, for example, $C_G$ is often called the cover time.
However, in this work we will use the term time to refer to the duration of the random walk.
To avoid confusion, from now on the term time will only denote the physical magnitude.



One-hop Replication. One-hop replication networks (also called lookahead
networks [22]) are networks where each node knows the identity of its neighbors and
so it can reply on their behalf. Hence, to find a certain node by a random walk it
suffices to visit any of its neighbors. This feature is present, for example, in social
networks, where to find some person it is usually enough to locate any of her/his
friends [20]. Also, certain proposals to improve the resource location process on
P2P systems [2, 23] (some based on random walks) assume that each node knows
the resources held by its neighbors, so to discover some resource (such as a file or
a service) it suffices to visit any of the neighbors of the node(s) holding it.

In one-hop replication networks, when the random walk visits some node i we say 
it also discovers the neighbors of i. Hence, we will use two different terms to refer 
to the coverage of the random walk. We denote by visited nodes those that have 
been traversed by the random walk, and by covered nodes the visited nodes and 
their neighbors. See Figure 1 for an illustrative example. 
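
The distinction between visited and covered nodes can be illustrated with a short simulation. The following is only a minimal sketch: the function name `random_walk_coverage`, the adjacency-list representation and the five-node example graph are assumptions made for illustration and are not taken from the paper.

```python
import random

def random_walk_coverage(adj, source, hops, rng=None):
    """Simulate a pure random walk and track visited and covered nodes.

    adj maps each node to the list of its neighbors (undirected graph).
    Returns the sets (visited, covered) after `hops` hops.
    """
    rng = rng or random.Random()
    current = source
    visited = {current}
    # One-hop replication: the source also "discovers" its neighbors.
    covered = {current} | set(adj[current])
    for _ in range(hops):
        current = rng.choice(adj[current])  # neighbor chosen uniformly at random
        visited.add(current)
        covered.add(current)
        covered.update(adj[current])        # neighbors of a visited node are covered
    return visited, covered

# Hypothetical example graph (illustrative only).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b"],
    "e": ["c"],
}
visited, covered = random_walk_coverage(adj, "d", hops=3, rng=random.Random(1))
print(sorted(visited), sorted(covered))
```

By construction every visited node is also covered, so the covered set is always a superset of the visited set.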



Previous Work and the Revisiting Effect. There is some research work related
to the characterization of random walks in one-hop replication networks. In [24]
the authors prove that in the power-law random graph the number of hops needed for a
random walk to discover the graph is sublinear (faster than coupon collection, with
which the random walk is compared in [19]). Also, Manku et al. [22] study the
impact of lookahead on P2P systems where searches are routed through greedy
mechanisms. In another work, Adamic et al. [20] try to find analytical expressions
for $C_G$, the cover time of a random walk, in power-law networks with two-hop
replication. They detected divergences between the analytical predictions and the
experimental results. The reason for such discrepancy, as the authors point out, is the
revisiting effect, which occurs when a node is visited more than once. In small-world
networks, where a small number of nodes are connected to other nodes far
more often than the rest, it is quite common for random walks to visit these
highly connected nodes often.



Our Contributions. Although there is a plethora of interesting results about random
walks, we have noticed that there are situations where current findings are
not straightforward to apply, especially in communication networks with one-hop
replication. For example, in such networks, we may be interested in studying
beforehand the expected behavior of the random walk to evaluate whether it suits the
system requirements. We characterize the random walk performance by four values:

• The expected coverage. Given by the expected number of visited and covered
nodes of each degree $k$ at each hop $l$ of the random walk.

• The expected average search length. Expected length of searches in number of
hops, assuming that the source and destination nodes of each search are chosen
uniformly at random. Obtained from the coverage estimations.

• The expected average search duration. Expected time to solve searches.
Obtained from the average search length, given the processing capacity of each
node and the load on the network due to queries.

• The maximum load that can be injected into the network without overloading it.

In this work we provide a set of expressions that model the behavior of the random
walk and give estimations for the previous parameters. Our claim is that
these expressions can be used as a mathematical tool to predict how random walks
will perform on networks of arbitrary degree distribution. Thus, we not only
address the coverage problem (i.e., estimating the number of nodes covered after
each hop of the random walk), but also apply queuing theory to model the
response time of the system depending on the load. As we show, this approach allows
us to compute in advance important magnitudes, such as the expected search duration or
the maximum load that can be managed by the network before getting overloaded.
Additionally, we find our model useful to study how certain features of the network
affect the performance of searches. For example, we find that the best average
search time is achieved only if the nodes with higher degrees also have greater
processing capacities.

The expressions related to the estimation of covered nodes at each hop are the
most complex part of the model. They must deal both with the one-hop replication 
feature and the revisiting effect. However, we should remark that the model can be 
trivially adapted to networks where the one-hop replication property does not hold, 
and the search finishes only when the node we are searching for is found (see the 
last paragraph in Section 2.4). 

Likewise, it is easy to adapt the model to a variation of the random walk where
each node avoids sending the message back to the node it received it from at the
previous hop. We call this routing mechanism the avoiding random walk, and we
deem it interesting for two reasons. First, intuitively, it should improve the random
walk coverage (we have confirmed this experimentally). Second, it can be
implemented in real systems using only local information, just as the pure random walk
(the sending node only needs to know which neighbor the message came from).
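
The forwarding rule of the avoiding random walk can be sketched in a few lines. This is an illustrative sketch only: the function name `next_hop` is hypothetical, and the fallback used when the previous node is the only neighbor (the message is sent back) is an assumption, since the text does not specify that case.

```python
import random

def next_hop(adj, current, previous=None, rng=random):
    """Choose the next node of an avoiding random walk.

    Only local information is used: the neighbor list of `current` and the
    neighbor the message came from (`previous`).
    """
    candidates = [v for v in adj[current] if v != previous]
    if not candidates:
        # Assumption: at a dead end the only option is to go back.
        candidates = adj[current]
    return rng.choice(candidates)

# Hypothetical example graph (illustrative only).
adj = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
print(next_hop(adj, "b", previous="a"))  # never prints "a"
```

Passing `previous=None` reduces the function to the pure random walk step, which makes the two mechanisms easy to compare in a simulation.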
[Figure 1 legend: Visited Nodes; Covered Nodes.]

Fig. 1. Illustrative example of visited and covered nodes

A feature of our proposal is that it does not require the complete adjacency matrix
$A$, which in some situations could be unknown. Instead, thanks to the randomness
assumption we apply, it only needs the degree distribution of the network to compute
the metrics we are interested in. On the other hand, this work is focused on
networks with good connectivity and where the node degrees are independent (see
Section 2.1).

Another property of this model is that it takes into account the revisiting effect by
modeling the coverage of the random walk at each hop $l$ depending on the coverage
at the previous hop $l - 1$. That is, the evolution of the coverage is not assumed to be
a memoryless process, a simplification that can lead to errors as seen in [20].

The rest of the paper is organized as follows. Section 2 introduces our analysis of 
the coverage and average search length of random walks, along with some exper- 
imental evaluation. Section 3 is centered on obtaining the average search time of 
random walks. Finally, in Section 4, we state our conclusions and propose some 
potential future work. 



2 Analysis of Random Walks 

In this section, we analyze the behavior of random walks in arbitrary networks.

2.1 Model and Assumptions

We will represent networks by means of undirected graphs $G = (V, E)$, where
vertices $V$ represent the nodes and edges $E \subseteq V \times V$ are the links between nodes. There
are no links connecting a vertex to itself, or multiple edges between the same two
vertices. This does not simplify our model, but makes it closer to real scenarios like
typical P2P networks. We denote by $|V| = n$ the number of nodes in the graph and
by $n_k$ the number of nodes that have degree $k$ (i.e., the number of nodes that have
$k$ neighbors; $\sum_k k\, n_k = 2|E|$). The degree $k$ of every vertex is lower than the size of the
network $n$, since in typical real world networks (such as social and pure P2P networks)
each node is connected to only a subset of the other vertices in the system^2. We
also denote by $p_k$ the probability that some node in the network, chosen uniformly
at random, has degree $k$ (i.e., $p_k = n_k / n$). The average degree of a network is given
by $\bar{k} = \sum_k k\, p_k$. For a given network, the distribution formed by the probabilities $p_k$
(for all $k$) is known as the degree distribution of such a network.

A random walk over $G$ can be defined as a Markov chain [15] $M_G$ where
the transition matrix $P = [P_{ij}]$ is defined as:

$$P_{ij} = \begin{cases} \dfrac{1}{d(i)} & \text{if } (i, j) \in E, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where $P_{ij}$ is the probability of moving from node $i$ to node $j$, and $d(i)$ is the degree
of node $i$. $P$ allows us to study the probability of visiting each node at each hop $l$.
This probability is expressed in the state probability vector $q^l = (q^l_1, q^l_2, \ldots, q^l_n)$,
where $q^l_i$ represents the probability that the random walk visits node $i$ at hop $l$. This
probability evolves as $q^l = q^{l-1} P$.

Assuming that $G$ is connected and finite, $M_G$ is irreducible: any node can be
reached from any other node, and the average path length between any two nodes is
finite. Assuming also that $G$ is non-bipartite, we can state that $M_G$ is aperiodic
and so we are able to apply the Fundamental Theorem of Markov Chains [15].
This theorem states that for such a graph $M_G$ is ergodic and there exists a unique state
probability distribution $\pi$, called the stationary distribution, such that $\pi P = \pi$,
with $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$, where $\pi_i$ is:

$$\pi_i = \frac{d(i)}{2|E|}. \tag{2}$$

Intuitively, $\pi$ represents the steady state of $M_G$. That is, $\pi_i$ represents the
probability that the node $i$ is visited at any hop of the random walk once the stationary
distribution has been reached. This probability is proportional to the degree of $i$,
$d(i)$.
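
As a numerical illustration of Equations 1 and 2, the sketch below builds the transition matrix of a small graph, checks that the degree-proportional vector is left unchanged by one step, and iterates the state vector from a fixed start node. The example graph and all function names are assumptions made for illustration; the graph is connected and contains a triangle, so it is non-bipartite.

```python
def transition_matrix(adj, nodes):
    """Row-stochastic matrix with P[i][j] = 1/d(i) if (i, j) is an edge, else 0."""
    index = {v: i for i, v in enumerate(nodes)}
    P = [[0.0] * len(nodes) for _ in nodes]
    for v in nodes:
        for w in adj[v]:
            P[index[v]][index[w]] = 1.0 / len(adj[v])
    return P

def stationary(adj, nodes):
    """pi_i = d(i) / (2|E|); the sum of all degrees equals 2|E|."""
    two_E = sum(len(adj[v]) for v in nodes)
    return [len(adj[v]) / two_E for v in nodes]

def step(q, P):
    """One hop of the chain: returns the row vector q P."""
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical example graph (illustrative only).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b"],
    "e": ["c"],
}
nodes = sorted(adj)
P = transition_matrix(adj, nodes)
pi = stationary(adj, nodes)

q = [0.0, 0.0, 0.0, 1.0, 0.0]  # the walk starts at node "d"
for _ in range(200):
    q = step(q, P)
print(pi)
print(q)
```

On this graph the iterated vector `q` approaches `pi`, in line with the convergence discussed in the text.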



^2 Some P2P networks like Napster have a central node that network members use to
locate files. However, those networks are not considered pure P2P systems because they use a
typical client-server architecture with a centralized topology to perform searches. They are
regarded as having a "P2P" behavior only in the way files are shared. This work rather
focuses on the decentralized topologies of pure P2P networks.
Mixing Rate and Conductance. We are interested in how fast the random walk
converges to $\pi$, a magnitude that is called the mixing rate [18]. We require a fast
convergence in order to be able to apply Equation 6.

The convergence rate is related to the eigenvalues of the transition matrix $P$. A
vector $x$ is an eigenvector of $P$ with eigenvalue $\lambda$ iff $xP = \lambda x$ (so, for example, $\pi$
is an eigenvector of $P$ with eigenvalue 1). It is well known [18] that $P$ has $n$ real
eigenvalues $\lambda_0 = 1 > \lambda_1 \geq \ldots \geq \lambda_{n-1} \geq -1$ (and in fact, if $G$ is non-bipartite
then $\lambda_{n-1} > -1$). It is also known [25] that the convergence rate to $\pi$ is governed
by the second largest eigenvalue modulus of $P$, $\max\{\lambda_1, |\lambda_{n-1}|\}$. In most real world
networks we can safely assume that $\lambda_1 > |\lambda_{n-1}|$ [18, 19, 25]. The following holds for
a random walk starting at node $i$ [18]:



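$$\left| P_i^l(j) - \pi_j \right| \leq \sqrt{\frac{d(j)}{d(i)}}\; \lambda_1^l, \tag{3}$$

where $P_i^l$ is the distribution of the state of the random walk at hop $l$, when $i$ is the
initial state. Thus, we can expect a fast mixing for high values of the spectral gap
$1 - \lambda_1$.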

Now, the value of $\lambda_1$ is strongly related to the conductance of the network, $\Phi_G$.
Informally, the conductance measures how well 'connected' the graph is. It is defined
as follows. For $S \subset V$, the cutset of $S$, $C(S)$, is the set of edges with one endpoint
in $S$ and the other endpoint in $\bar{S} = V \setminus S$. The volume of $S$, $\mathrm{vol}(S)$, is defined as the sum
of the degrees of the nodes in $S$, i.e., $\mathrm{vol}(S) = \sum_{i \in S} d(i)$. Then the conductance of $G$ is
computed as:

$$\Phi_G = \min_{\substack{S \subset V \\ \mathrm{vol}(S) \leq \mathrm{vol}(V)/2}} \frac{|C(S)|}{\mathrm{vol}(S)}. \tag{4}$$

The relationship between the conductance and the convergence is given by the
following expression (Cheeger's inequality) [18]:

$$\frac{\Phi_G^2}{2} \leq 1 - \lambda_1 \leq 2\Phi_G. \tag{5}$$



So a good conductance leads to high mixing rates, that is, the random walk state
will converge quickly to the stationary distribution $\pi$. The intuition behind this fact
is that in graphs with good conductance the random walk will be able to move to any
region of the graph easily, whatever the origin node, and so it will evolve quickly
to the equilibrium. We reason that high connectivity is to be expected in many real
world networks (especially communication networks) and network models [26-28].
Therefore, we can assume that the probability that the node visited by the random
walk has degree $k$ at each hop of the random walk, $P(k)$, is also proportional to $k$
and can be computed as:

$$P(k) = \sum_{i:\, d(i) = k} \pi_i = \frac{k\, n_k}{2|E|} = \frac{k\, p_k}{\bar{k}}. \tag{6}$$
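
The degree-biased probability $P(k) = k\, p_k / \bar{k}$ (the form that also appears in Equation 8) only needs the degree counts $n_k$. A minimal sketch follows; the function name and the example counts are hypothetical.

```python
def degree_visit_probability(n_k):
    """Map each degree k to P(k) = k * p_k / k_bar.

    n_k: dict mapping a degree k to the number of nodes with that degree.
    """
    n = sum(n_k.values())
    p = {k: count / n for k, count in n_k.items()}   # p_k = n_k / n
    k_bar = sum(k * pk for k, pk in p.items())       # average degree
    return {k: k * pk / k_bar for k, pk in p.items()}

# Hypothetical degree counts: two nodes of degree 1, one of degree 2, two of degree 3.
n_k = {1: 2, 2: 1, 3: 2}
P_of_k = degree_visit_probability(n_k)
print(P_of_k)
```

In this example the degree-3 nodes make up 40% of the nodes but receive a larger share of the visits, which is the degree bias the model relies on.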



We will apply Equation 6 intensively in our analysis of the coverage. Of course,
its correctness depends on the distance of the random walk to the stationary
distribution, or how fast it converges to it. Another issue to be taken into account is the
possible dependencies between successive steps of the random walk. Our analysis
estimates the average number of nodes visited and covered by the random walk at
a certain hop from the values estimated at the previous hop. The new estimation is
done assuming that the random walk has statistical properties similar to the random
sampling of nodes where the probability of choosing a certain node $i$ is proportional
to its degree $k_i$, despite the apparent dependencies between consecutive hops.

Also, the work by Gkantsidis et al. [19] shows the similarities between independent
sampling and random walks, which we assume for our mean-based analysis. As the
authors state, in networks with good connectivity and expansion properties (which
are strongly related to $\lambda_1$) the random walk has a behavior close to independent
sampling, with the probability of choosing some node being proportional to its degree.

Besides, we have performed some experiments to verify the correctness of this
hypothesis. The results, shown in Figure 2, confirm it is a valid assumption. Also, we
would like to remark that the property expressed by Equation 6 is in fact assumed
in previous works about random walks (e.g., [20, 21]) and backed by [19].

Another important issue we have tested is how 'fast' the random walk evolves to a
state where the assumption of Equation 6 holds. Figure 3 shows how the random walk
behaves. It can be seen that, almost immediately after hop 0 (start node), the probability of
reaching a node of degree $k$ is $P(k)$.

We should note that the good conductance property, which implies that the random
walk can move from any node to any other node in few steps, rules out some
topologies such as cycles.
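
The remark about cycles can be checked on small graphs by evaluating the definition of conductance directly. The sketch below enumerates every vertex subset, so it is only usable for very small graphs; the function names are hypothetical and the brute-force search is an illustrative choice, not a method used in the paper.

```python
from itertools import combinations

def conductance(adj):
    """Brute-force conductance: min of |C(S)| / vol(S) over vol(S) <= vol(V)/2."""
    nodes = list(adj)
    total_vol = sum(len(adj[v]) for v in nodes)
    best = float("inf")
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            S = set(subset)
            vol = sum(len(adj[v]) for v in S)
            if vol > total_vol / 2:
                continue
            # Edges with exactly one endpoint in S.
            cut = sum(1 for v in S for w in adj[v] if w not in S)
            best = min(best, cut / vol)
    return best

def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

print(conductance(cycle(12)))     # half of the cycle: 2 cut edges over volume 12
print(conductance(complete(12)))  # half of the clique: 36 cut edges over volume 66
```

For a cycle, the best cut always has two edges while the volume of half the cycle grows with $n$, so the conductance tends to zero as the cycle grows.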



Independence of Node Degrees. Finally, we assume that the degrees of neighbors
are independent. That is, given any two connected nodes $i$ and $j$ ($(i, j) \in E$)
and any two degree values $k_1$ and $k_2$, then $P[d(i) = k_1 \mid d(j) = k_2] = P[d(i) = k_1] = p_{k_1}$.
This property holds in networks built by random mechanisms, like the ones used
to build the ER and small-world networks we target in our experiments. To confirm
that the degree independence assumption is valid we have run some experiments,
whose results are shown in Figure 4. These experiments aim to measure whether the
probability of reaching a node of degree $k$ when following a random walk is affected by
the degree $k'$ of the node where the random walk was at the previous hop ($P(k/k')$). Our
results lead to the conclusion that $\forall k, k'$: $P(k/k') = P(k)$, that is, $k'$ does not have an
impact on $k$.

[Figure 2: two panels, (a) Erdos-Renyi networks and (b) Small-world networks.]

Fig. 2. In these figures, we show the probability of a search message arriving at a particular
node as a function of its degree. We have used both Erdos-Renyi and small-world
(power-law) networks formed by 50,000 nodes, with different average node degrees (10, 20
and 30). The same experiments have been performed with networks formed by 25,000 and
100,000 nodes, and we found similar results. As can be readily seen, the probability of a
search message arriving at a particular node is proportional to the degree of the node.

[Figure 3: two panels, (a) Erdos-Renyi network, $\bar{k} = 30$, and (b) Small-world network,
$\bar{k} = 10$. Legend entries pair each model value P(k) with the measured value P(k/hop) for
several degrees k.]

Fig. 3. These figures compare the probability $P(k)$ of reaching a node of degree $k$ as defined
by the model, with the measured probability of reaching a node of degree $k$ at each hop
of the random walk. Both for ER and small-world networks the experimental results are
averaged over three different networks with the same average degree and size ($n = 50 \cdot 10^3$).



[Figure 4: two panels, (a) Erdos-Renyi network, $\bar{k} = 30$, and (b) Small-world network,
$\bar{k} = 10$. Legend entries pair each model value P(k) with the measured value P(k/k') for
several degrees k. Horizontal axis: degree $k'$ of the node the rw comes from.]

Fig. 4. These figures compare the probability $P(k)$ of reaching a node of degree $k$ as defined
by the model, with the measured probability of reaching a node of degree $k$ given that
the rw comes from a node of degree $k'$, $P(k/k')$. Both for ER and small-world networks
the experimental results are averaged over three different networks with the same average
degree and size ($n = 10^5$).

We should also note that this property is not fulfilled in certain graphs, like those
built by preferential mechanisms, where it is well known that there is a correlation
among neighbors' degrees [29]. This could lead to certain deviations in mean-based
analyses of the random walk (such as our own).

In the following, we study how many different nodes are visited by a random walk
as a function of its length (i.e., of the number of steps taken) and of the degree
distribution of the chosen network. Subsequently, we extend this result to also consider
the neighbors of the visited nodes. These metrics allow us to quantify how much of
a network is "known" as a random walk progresses. Then, we turn our
attention to providing an estimation of the average search length of a random walk.
In the last subsection, we validate our analytical results by means of simulations.
We assume that only the degree distribution $p_k$ and the size $n = |V|$ of the network
are known.



2.2 Number of Visited Nodes



This metric represents the average number of different nodes that are visited by a
random walk until hop $l$ (inclusive), denoted by $V^l$. Note that each node may be
visited more than once, but revisits are not counted.

To obtain $V^l$, we first calculate the average number of different nodes of degree $k$
that are visited by a random walk until hop $l$ (inclusive), denoted by $V_k^l$. We make a
case analysis:

• When $l = 0$ (i.e., in the source node): Since the source node of the random walk
is chosen uniformly at random, the probability of starting a random walk at
a node of degree $k$ is $p_k$. Therefore,

$$V_k^0 = 1 \cdot p_k + 0 \cdot (1 - p_k) = p_k. \tag{7}$$

• When $l = 1$ (i.e., at the first hop): Here we apply that the probability of visiting
some node of degree $k$ at any hop is given by $P(k)$ (Equation 6). This is based on
the assumption that the random walk behaves similarly to independent sampling
despite dependencies between consecutive hops (based on [19], see Section 2.1).
We deem this premise to be reasonable even at the first stages of the random
walk, due to the high mixing rates found in the type of networks on which we
focus our work (again, see Section 2.1). Recall that the experimental evaluation
both of this assumption (Fig. 2) and of our model (shown in Section 2.5) seems
to verify this. Thus, we have that

$$V_k^1 = V_k^0 + P(k) = p_k + \frac{k\, p_k}{\bar{k}}. \tag{8}$$

• When $l > 1$: we must take into account the probability of the random walk
arriving at an already visited node. To compute such a probability, we define the
following two values:

■ $P_v(k, l)$: This represents the probability that, if the random walk arrives at a
node of degree $k$ at hop $l$, that node has been visited before. It can be obtained
as follows:

$$P_v(k, l) = \frac{V_k^{l-2}}{n_k}. \tag{9}$$

Note that we use $V_k^{l-2}$ instead of $V_k^{l-1}$ because the node visited at hop $l - 1$
cannot be visited at hop $l$ (no vertex is connected to itself).

■ $P_b$: This is the probability that at any given hop the random walk is moving
back to the node where it came from^3. Since any visited node has degree $k$
with probability $P(k)$, the random walk will go back through the same
link from which it came with probability $1/k$. Therefore, we have:

$$P_b = \sum_k P(k)\, \frac{1}{k} = \sum_k \frac{p_k}{\bar{k}} = \frac{1}{\bar{k}}. \tag{10}$$

Using these probabilities, $V_k^l$ can be written as

$$V_k^l = V_k^{l-1} + P(k)\,(1 - P_b)\,(1 - P_v(k, l)) = V_k^{l-1} + \frac{k\, p_k}{\bar{k}} \left(1 - \frac{1}{\bar{k}}\right) \left(1 - \frac{V_k^{l-2}}{n_k}\right). \tag{11}$$

^3 Here we can easily adapt the model to the avoiding random walk. If we do not want to
consider the case of a random walk moving back to the node where it came from, it is
enough to set $P_b = 0$.
Finally, taking the results obtained in Equations 7, 8 and 11, we have that the total
number of different nodes visited until hop $l$ is

$$V^l = \sum_k V_k^l. \tag{12}$$
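
The case analysis for the visited nodes translates into a short recursion over hops. The sketch below is an illustrative reading of that case analysis, assuming $P(k) = k\, p_k / \bar{k}$ and $P_b = 1/\bar{k}$ (with $P_b = 0$ for the avoiding random walk); the function name and the example degree counts are hypothetical.

```python
def visited_per_degree(n_k, hops, avoiding=False):
    """Return a list V where V[l][k] estimates the visited nodes of degree k up to hop l.

    n_k: dict mapping a degree k to the number of nodes with that degree.
    """
    n = sum(n_k.values())
    p = {k: c / n for k, c in n_k.items()}
    k_bar = sum(k * pk for k, pk in p.items())
    P = {k: k * p[k] / k_bar for k in n_k}          # degree-biased visit probability
    P_b = 0.0 if avoiding else 1.0 / k_bar          # probability of stepping back

    V = [dict(p)]                                   # hop 0: the source node
    if hops >= 1:
        V.append({k: V[0][k] + P[k] for k in n_k})  # hop 1
    for l in range(2, hops + 1):
        prev, prev2 = V[l - 1], V[l - 2]
        V.append({
            # New node only if the walk does not step back and the node is unvisited.
            k: prev[k] + P[k] * (1 - P_b) * (1 - prev2[k] / n_k[k])
            for k in n_k
        })
    return V

# Hypothetical network: 500 nodes of degree 2 and 500 nodes of degree 4.
n_k = {2: 500, 4: 500}
V = visited_per_degree(n_k, 100)
print(sum(V[-1].values()))  # estimated number of distinct nodes visited after 100 hops
```

Summing `V[l]` over all degrees gives the total number of visited nodes at hop `l`; calling the function with `avoiding=True` gives the corresponding estimate for the avoiding random walk.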

2.3 Number of Covered Nodes

This metric provides an estimation of the average number of different nodes covered
by a random walk until hop $l$ (inclusive), denoted by $C^l$. A node is covered by a
random walk if such a node, or any of its neighbors, has been visited by the random
walk.

To obtain $C^l$, we first calculate the number of different nodes of degree $k$ covered
at hop $l$, denoted by $C_k^l$.

• When $l = 0$:

$$C_k^0 = p_k\,(1 + k\, P(k)) + \sum_{j \neq k} p_j\, j\, P(k) = V_k^0 + P(k)\, \bar{k}. \tag{13}$$

The first term takes into account the possibility that the source node has
degree $k$. The second term refers to the number of neighboring nodes (of the source
node) of degree $k$. If the source node has degree $j$ (which happens with probability
$p_j$) then, on average, $j\, P(k)$ nodes of degree $k$ will be covered, since each one
of the $j$ neighboring nodes of the source node will have degree $k$ with probability
$P(k)$.

• When $l > 0$: Given a link $(v, w) \in E$, we say that it has two endpoints, which are
the two ends of the link. We denote the endpoint of the link at node $v$ by $v(w)$,
and similarly the endpoint of the link at node $w$ by $w(v)$. We say that $v(w)$ hooks
onto node $v$. We also say that $v(w)$ has been checked by a random walk if such
a random walk has visited node $w$. These concepts are graphically explained in
Fig. 5.

Now, let us denote by $E^l$ the number of endpoints checked for the first time at
hop $l$, and by $P_u(k, l)$ the probability that these endpoints hook onto still uncovered
nodes of degree $k$. Then, $C_k^l$ (where $l > 0$) can be written as follows:

$$C_k^l = C_k^{l-1} + P_u(k, l)\, E^l. \tag{14}$$

• To obtain $E^l$, we consider the number of different endpoints checked after hop $l$
to be $\sum_j j\, V_j^l$. So, the number of endpoints checked for the first time at hop $l$ is
$\sum_j (V_j^l - V_j^{l-1})\, j$. However, one of the endpoints hooks onto the node the random
walk comes from (i.e., it cannot increase the number of nodes that are covered).
Thus:

$$E^l = \sum_j (V_j^l - V_j^{l-1})\,(j - 1). \tag{15}$$

Fig. 5. The figure shows a simple graph formed by 5 nodes (named a, b, c, d and e) where
there is a random walk that follows the path d - b - c - e. At each node, we represent the
different "endpoints" that are hooked on that node by means of small circles. For instance, the
endpoints $a(b)$ and $a(c)$ are said to be hooked onto node a. In the graph, when the random
walk starts (at node d), then endpoint $b(d)$ is said to be checked. Similarly, when it visits
node b, then endpoints $d(b)$, $a(b)$ and $c(b)$ are said to be checked. The same mechanism
applies when the random walk visits nodes c and e.

To obtain $P_u(k, l)$, on one hand we consider that the overall number of endpoints
hooking onto uncovered nodes of degree $k$ just before hop $l$ is $k\,(n_k - C_k^{l-1})$.
On the other hand, the overall number of endpoints is $\sum_j j\, n_j$, and the overall
number of checked endpoints until hop $l - 1$ (inclusive) is $\sum_j j\, V_j^{l-1}$. That is,
the number of endpoints not checked just before hop $l$ is $\sum_j j\, n_j - \sum_j j\, V_j^{l-1}$.
Therefore, we can write:

$$P_u(k, l) = \frac{k\,(n_k - C_k^{l-1})}{\sum_j j\,(n_j - V_j^{l-1})}. \tag{16}$$

Substituting Equations 15 and 16 into Equation 14, we have that

$$C_k^l = C_k^{l-1} + \frac{k\,(n_k - C_k^{l-1})}{\sum_j j\,(n_j - V_j^{l-1})} \times \sum_j (V_j^l - V_j^{l-1})\,(j - 1). \tag{17}$$



Finally, taking into account Equations 13 and 17, we have that the total number of
nodes covered after hop $l$ is

$$C^l = \sum_k C_k^l. \tag{18}$$

[Figure 6: two panels, (a) Erdos-Renyi network and (b) Small-world network.]

Fig. 6. In the Erdos-Renyi network most nodes have approximately the same number of
links. In contrast, the small-world network is heterogeneous: the majority of the nodes have
approximately the same number of links but a few nodes have a large number of them.
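
The coverage recursion can be evaluated numerically together with the visited-node recursion of Section 2.2. The sketch below is one possible reading of the expressions in Sections 2.2 and 2.3, not a reference implementation: the function name and example degree counts are hypothetical, and the guards against a zero denominator and against negative uncovered counts are added assumptions.

```python
def coverage(n_k, hops, avoiding=False):
    """Return (V, C): per-hop totals of visited and covered nodes.

    n_k: dict mapping a degree k to the number of nodes with that degree.
    """
    degrees = sorted(n_k)
    n = sum(n_k.values())
    p = {k: n_k[k] / n for k in degrees}
    k_bar = sum(k * p[k] for k in degrees)
    P = {k: k * p[k] / k_bar for k in degrees}
    P_b = 0.0 if avoiding else 1.0 / k_bar

    V = [dict(p)]                                      # visited at hop 0
    C = [{k: p[k] + P[k] * k_bar for k in degrees}]    # source plus its neighbors
    for l in range(1, hops + 1):
        if l == 1:
            V_new = {k: V[0][k] + P[k] for k in degrees}
        else:
            V_new = {k: V[l - 1][k]
                     + P[k] * (1 - P_b) * (1 - V[l - 2][k] / n_k[k])
                     for k in degrees}
        # Endpoints checked for the first time, excluding the link just traversed.
        E_l = sum((V_new[j] - V[l - 1][j]) * (j - 1) for j in degrees)
        # Endpoints not checked just before hop l.
        unchecked = sum(j * (n_k[j] - V[l - 1][j]) for j in degrees)
        C_new = {}
        for k in degrees:
            uncovered = max(0.0, n_k[k] - C[l - 1][k])             # guard (assumption)
            P_u = k * uncovered / unchecked if unchecked > 0 else 0.0
            C_new[k] = C[l - 1][k] + P_u * E_l
        V.append(V_new)
        C.append(C_new)
    return [sum(v.values()) for v in V], [sum(c.values()) for c in C]

# Hypothetical network: 1000 nodes with average degree 8.
n_k = {5: 600, 10: 300, 20: 100}
V_tot, C_tot = coverage(n_k, 50)
print(V_tot[-1], C_tot[-1])
```

In this example the source alone covers $1 + \bar{k} = 9$ nodes on average, and the covered total grows several times faster than the visited total during the first hops.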



2.4 Average Search Length 



Using the previous metric, we are now able to provide an estimation of the average
search length of random walks, denoted by $\bar{l}$. Formally, $\bar{l}$ is given by the following
expression:

$$\bar{l} = \sum_{l=0}^{\infty} l\, P_f(l), \tag{19}$$

where $P_f(l)$ is the probability that the search finishes at hop $l$ (i.e., the probability
that the search is successful at hop $l$, having failed during the previous $l - 1$ hops).
Let us define the probability of success at hop $l$, denoted by $P_s(l)$, as the probability
of finding, at that hop, the node we are searching for. $P_s(l)$ can be obtained as the
ratio between the number of new nodes that will be covered at hop $l$ and the
number of nodes that are still uncovered at hop $l$. That is,

$$P_s(l) = \frac{C^l - C^{l-1}}{n - C^{l-1}}. \tag{20}$$

Now, $P_f(l)$ can be obtained as follows:

$$P_f(l) = P_s(l) \prod_{i=0}^{l-1} \left(1 - P_s(i)\right). \tag{21}$$

Therefore, $\bar{l}$ can be written as

$$\bar{l} = \sum_{l=0}^{\infty} l\, P_s(l) \prod_{i=0}^{l-1} \left(1 - P_s(i)\right). \tag{22}$$
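
Given a sequence of coverage values, the average search length can be accumulated hop by hop. The sketch below assumes that the success probability at a hop is the ratio of newly covered nodes to nodes still uncovered before that hop, as described in the text, and that nothing is covered before hop 0; the function name is hypothetical, and the infinite sum is truncated at the last available hop.

```python
def average_search_length(C, n):
    """Estimate the average search length from a coverage sequence.

    C[l] is the expected number of covered nodes up to hop l; n is the network size.
    Returns (expected_length, leftover), where leftover is the probability mass
    of searches still unfinished after the last hop in C.
    """
    expected = 0.0
    survive = 1.0      # probability that the search failed at all previous hops
    prev = 0.0         # assumption: no node is covered before hop 0
    for l, c in enumerate(C):
        remaining = n - prev
        if remaining <= 0:
            break
        P_s = (c - prev) / remaining   # success probability at hop l
        P_f = P_s * survive            # search finishes exactly at hop l
        expected += l * P_f
        survive *= 1 - P_s
        prev = c
    return expected, survive

# Toy coverage sequence on a 10-node network (illustrative values only).
expected, leftover = average_search_length([2, 6, 10], 10)
print(expected, leftover)
```

Under the stated assumption the product of failure probabilities telescopes, so the probability of finishing at a hop equals the newly covered fraction of the network at that hop (0.2, 0.4 and 0.4 in the toy sequence). A nonzero `leftover` indicates that the coverage sequence was too short to capture all searches.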



2.5 Experimental Evaluation 



We have run a set of experiments to evaluate the accuracy of the expressions pre- 
sented in the previous subsections. The results obtained are presented in this sec- 
tion. 

For our work, we consider two kinds of network: small-world networks (con- 
structed as in [21]) and Erdos-Renyi networks (constructed as in [30]). 

• Small-world networks [21, 31]. In [32] it is shown that many real world networks
present an interesting feature: each node can be reached from any other node in
few hops. These networks are typically called small-world networks. The
Internet, the Web, the science collaboration graph, etc. are examples of real world
networks that are consistent with this property. This kind of network is also
especially interesting for our work because here the revisiting effect discussed
in Section 1 is strongly present due to the uneven degree distribution. We build
small-world networks using the mechanism described in [21], which leads to
networks whose degree distribution follows a power-law distribution $p_k \sim k^{-\alpha}$
(power-law networks).

• Erdos-Renyi (ER) random networks [30]. For any two nodes $i, j \in V$ there is a
constant probability $c$ that they are connected ($(i, j) \in E$). The resulting degree
distribution is a binomial distribution $p_k \sim \binom{n}{k} c^k (1 - c)^{n-k}$.



See Figure 6 for an illustrative example of both kinds of networks. 
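The ER construction described above can be sketched in a few lines. This is only an
illustration: the function name `erdos_renyi`, the adjacency-list representation, and the toy
parameters are assumptions of the example, and the small-world generator of [21] is not
reproduced here.

```python
import random
from itertools import combinations


def erdos_renyi(n, c, rng):
    """Build an ER graph: each pair of nodes is linked with probability c."""
    adj = {node: [] for node in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < c:
            adj[i].append(j)
            adj[j].append(i)
    return adj


# Toy parameters chosen for illustration only.
graph = erdos_renyi(n=200, c=0.05, rng=random.Random(7))
mean_degree = sum(len(neigh) for neigh in graph.values()) / len(graph)
# The expected degree of a node is c * (n - 1), i.e. 9.95 for these parameters.
```

For large $n$ the expected degree is close to $c\,n$, which is how a target average degree can be
turned into a value of $c$.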
Number of Visited and Covered Nodes. Our first goal is to study the evolution
of the network coverage by random walks in real networks.

The experiments were run on networks of two sizes, $n = 5 \cdot 10^4$ and $n = 10^5$ nodes.
Networks were built using three different average degrees: $\bar{k} = 10$, $\bar{k} = 20$ and
$\bar{k} = 30$. In each network we ran $10^4$ random walks of length $n = |V|$. The source node
of each random walk was chosen uniformly at random. From the experiments, we
obtained the average number of visited and covered nodes for each degree $k$ at each
hop $l$. Finally, for each network, we extracted its degree distribution $n_k$ and applied
the expressions described in the previous section to get a prediction of those values,
given by $V_k^l$ and $C_k^l$. Results are shown in Figures 7, 8, 9, and 10. For the sake of
clarity, the experimental results are shown every 2000 hops in all figures. Model
predictions, on the other hand, are drawn as lines.
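The measurement described above can be sketched as follows. This is a minimal illustration,
not the simulator used for the experiments: the function name `random_walk_coverage`, the
adjacency-list input, and the toy ring topology are assumptions of the example. A node counts
as visited when the walk stands on it, and as covered when it is visited or adjacent to a visited
node.

```python
import random


def random_walk_coverage(adj, source, hops, rng):
    """Return (visited, covered) counts after each hop of a pure random walk."""
    visited = {source}
    covered = {source, *adj[source]}
    current = source
    history = [(len(visited), len(covered))]
    for _ in range(hops):
        current = rng.choice(adj[current])   # uniform choice among neighbours
        visited.add(current)
        covered.add(current)
        covered.update(adj[current])         # one-hop replication
        history.append((len(visited), len(covered)))
    return history


# Toy topology for illustration only: a ring of 8 nodes.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
trace = random_walk_coverage(ring, source=0, hops=20, rng=random.Random(1))
```

Averaging such traces over many walks and sources gives the experimental curves; splitting
the counts by node degree would give the per-degree values.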

Figure 7(a) shows the evolution of the number of visited nodes in ER and small-world
networks of size $n = 5 \cdot 10^4$ nodes, with two different average degrees, $\bar{k} = 10$
and $\bar{k} = 30$. We see that, although the length of the random walks is enough to
potentially include all the nodes, only a fraction of them are visited. This happens
because of the revisiting effect, which becomes more evident as the number of hops
increases, since the probability of revisiting grows with the number of hops. The
revisiting effect is stronger in small-world networks than in random networks. The
reason is the uneven distribution of node degrees: there are some nodes with
a very high degree that will be visited again and again by the random walk. Thus,
the chances of finding new nodes at each hop drop faster in small-world
networks than in ER networks. Also, we observe in Figure 7(a) that the revisiting
effect is stronger in networks of smaller $\bar{k}$. Finally, Figure 7(b) shows the impact of
the network size $n$ on the number of visited nodes. As expected, a greater $n$ implies
fewer revisits for the same number of hops. In all cases, the prediction
of the total number of different nodes visited is very close to the experimental
results.

In Figure 8 we study the accuracy of the predictions of the number of visited nodes
of a particular degree $k$ at each hop $l$, $V_k^l$. We draw the results and predictions for
degrees $k = \bar{k} + 5$ and $k = \bar{k} - 5$, for $\bar{k} = 10$, $\bar{k} = 20$ and $\bar{k} = 30$. Again, it can be
seen that the model predictions fit the experimental results very well, despite
the revisits and the different behavior observed for different degrees.

Figure 9 gives the results of the experiments run to study the coverage of the random
walk. Figure 9(a) shows how the coverage grows faster in small-world networks
than in ER networks for networks of the same average degree $\bar{k}$. This contrasts
with the number of visited nodes, which behaves in the opposite way (see previous
paragraphs). The reason is the presence of well-connected nodes, which are quickly
visited during the first hops of the random walk and considerably increase the
coverage because of their large number of neighbors. For example, after 4000
hops, the random walk has covered about half of the small-world network with
$\bar{k} = 10$, while in the ER network of the same $\bar{k}$ the random walk has only covered
close to 30% of the nodes. Moreover, we can see that the network average degree
also has an important impact on the coverage. In both kinds of networks the
coverage grows faster when the average degree is higher. Besides, we observe that the
difference in coverage between both networks decreases more quickly for a higher $\bar{k}$.
Figure 9(a) confirms the importance of the average degree, comparing the results
for networks of different size and $\bar{k}$. In addition, Figure 9(b) compares the results of
the coverage for ER networks of different sizes and average degrees. As could be
expected, the networks of smaller size require fewer hops to be covered. We also
observe that the average degree has an important influence on the coverage difference.
The greater the average degree, the faster the coverage of both networks converges.
In all cases, the $C^l$ values given by the model predict very well how the coverage
behaves and evolves.

Fig. 11. Avoiding Random Walk, Visited and Covered Nodes $V^l$ and $C^l$.

Figure 10 allows us to check the precision of the coverage predictions for different $k$
values, $C_k^l$. As before, the values provided are very close to the experimental results,
although the behavior of the coverage changes strongly depending on the kind of
network and average degree.

Finally, we check the model accuracy for random walks that avoid the previous
node, the avoiding random walk. As stated in Section 2.2, the avoiding random
walk can be easily represented in our model just by setting the value of $P_t$ (see
Equation 10). Results are shown in Figure 11. There we compare the coverage of pure
and avoiding random walks in ER and small-world networks of size $n = 10^5$ nodes
and average degree $\bar{k} = 10$. Figure 11(a) confirms that, as expected, the avoiding
random walk is able to visit a greater number of different nodes, as the revisiting
effect is, to a certain degree, lessened. However, Figure 11(b) shows that this has
little impact on the network coverage. We find that there is only a small increase
in the number of covered nodes when using avoiding random walks, for both kinds
of networks. Nonetheless, in all cases the $V^l$ and $C^l$ values given by the model are
very close to the real results.
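To make the difference with the pure random walk concrete, the forwarding rule of an avoiding
random walk can be sketched as below. The names `avoiding_step` and `avoiding_walk` are
hypothetical, and the fallback used when the previous node is the only neighbor is an
assumption of this sketch, since that case is not discussed in this section.

```python
import random


def avoiding_step(adj, current, previous, rng):
    """Pick the next hop uniformly among neighbours other than `previous`."""
    candidates = [v for v in adj[current] if v != previous]
    if not candidates:
        # Assumption: a node whose only neighbour is `previous` sends the walk back.
        candidates = adj[current]
    return rng.choice(candidates)


def avoiding_walk(adj, source, hops, rng):
    """Return the list of nodes traversed by an avoiding random walk."""
    path = [source]
    previous, current = None, source
    for _ in range(hops):
        nxt = avoiding_step(adj, current, previous, rng)
        previous, current = current, nxt
        path.append(current)
    return path


# Toy topology for illustration only: a ring of 8 nodes.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
path = avoiding_walk(ring, source=0, hops=7, rng=random.Random(0))
```

On the toy ring the avoiding walk never turns back, so it visits every node in seven hops,
whereas a pure random walk would typically need many more.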



Average Search Length. For the experiments regarding the average search length
we used networks whose sizes ranged from $10^4$ to $2 \cdot 10^5$ nodes. In each experiment
we ran $10^4$ searches, averaging the obtained results. At each search, two nodes (one
corresponding to the source and the other to the destination) were chosen uniformly
at random. Starting from the source, a random walk traversed the network until the
destination node was found (i.e., until a neighbor of the destination was visited).

Fig. 12. Average Search Length $\bar{l}$.
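The search experiment just described can be sketched as follows. This is an illustrative
reading, not the original simulator: the names `search_length` and `mean_search_length`, the
`max_hops` safeguard, and the decision to also stop when the walk stands on the destination
itself are assumptions of the example. The graph is assumed to be connected.

```python
import random


def search_length(adj, source, target, rng, max_hops=10**6):
    """Hops taken by a random walk until the current node knows `target`.

    With one-hop replication a node knows its own resources and those of its
    neighbours, so the search ends on the target or on one of its neighbours.
    Returns None if max_hops is exceeded (a safeguard added in this sketch).
    """
    knows_target = {target, *adj[target]}
    current = source
    for hop in range(max_hops + 1):
        if current in knows_target:
            return hop
        current = rng.choice(adj[current])
    return None


def mean_search_length(adj, searches, rng):
    """Average search length over random (source, destination) pairs."""
    nodes = list(adj)
    total = 0
    for _ in range(searches):
        source, target = rng.choice(nodes), rng.choice(nodes)
        total += search_length(adj, source, target, rng)
    return total / searches


# Toy topology for illustration only: a path 0 - 1 - 2 - 3.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hops_needed = search_length(path_graph, source=0, target=3, rng=random.Random(3))
```

A search started at a node that already knows the destination has length 0, which matches the
first check a node performs before forwarding a query.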

The first thing to note is that the average search length grows linearly with the
network size in both ER and small-world networks. Besides, the average degree $\bar{k}$
has an important effect on the results. The larger $\bar{k}$ is, the shorter the searches are.
The reason is that a higher $\bar{k}$ implies that at each hop more nodes of the network are
discovered. Also, it can be observed in Figure 12 that the average search length is
greater in ER networks than in small-world networks. This can be explained if we
take into account that random walks, on average, cover more nodes in small-world
networks than in ER networks (see Figure 9).

As in the previous experiments, Figure 12 also shows that our experimental results
regarding the average search length correspond very closely to the analytical results
that were obtained.

At this point, we would like to note that, given the assumptions we made in our 
analytical model, it seems that the very good match achieved with the experimental 
results could only occur if these assumptions are correct. As a matter of fact, we 
have verified, in practice (see Figs. 2 and 4), that the type of networks we consider 
in this paper, indeed, fulfill our assumptions. 

On the other hand, it is clear that if we consider networks that do not fulfill
some of our assumptions, then a certain mismatch should be expected. For instance,
networks built by preferential mechanisms are known not to preserve the
independence of degrees of neighbors [29]. Therefore, we should not expect a very close
correspondence between analytical and experimental results. We have performed
the same experiments we ran for random and small-world networks regarding the
average search length, but this time with networks built using the preferential
attachment mechanism proposed by Barabasi [31]. We have observed that, as
expected, in preferential networks our experimental results do not correspond very
closely to the analytical results (see Fig. 13(a)). Instead, the model seems to be
consistently pessimistic. Also, the error grows continuously with the network size.

Fig. 13. Average Search Length $\bar{l}$, not pure random networks.

Finally, we have tested the model against toroidal networks of different average
degrees, $\bar{k} = 10$ (5 dimensions) and $\bar{k} = 16$ (8 dimensions). Our intention is to
analyze networks which are not random at all. The results, shown in Fig. 13(b),
reveal a very clear mismatch between the results predicted by the model and the actual
performance of the random walk.



3 Duration of Searches by Random Walks 



In this section, we present the second part of our model. Here we provide useful
expressions that allow us to predict the performance of random walks as a search tool,
which is the main goal of this work. These expressions rely on an estimation
of the average search length (such as the one described in the previous section), which is
combined with queuing theory [33]. As a result, given the processing capacities
and degrees of nodes, we are able to compute two key values:

• The load limit: the maximum search rate that the network can handle before
saturation.

• The average search time: the average time it takes to complete a search, given
the global load.

Also, we show how these expressions can be used to analyze which features a
network should have so that random walks perform better (i.e., searches are
solved in less time). In particular, we focus on studying the relationship between
degree and capacity distributions, showing that the minimum search time is
obtained when nodes of higher capacities are also those of higher degrees.

In our analysis, networks are assumed to be Jackson networks [33]: the arrival of
new searches into the network follows a Poisson process and the service time at each
node is exponentially distributed.



3.1 Search Length and Load on Nodes



Our first step is to establish the relationship between the average search length and the
system load. Each search is processed, on average, $1 + \bar{l}$ times (once at the source
node, and once at each step of the random walk). Using this, we can express the
total load on all the nodes of the system, $\lambda$, as

$$\lambda = (1 + \bar{l}) \, \gamma, \tag{23}$$

where $\gamma$ is the load injected in the system by new searches, which we assume to
be known. Note that $\lambda$ is composed of the newly generated searches ($\gamma$), plus the
searches that move from one node to another, denoted by $\gamma'$. Hence,

$$\bar{l} = \frac{\gamma' + \gamma}{\gamma} - 1 = \frac{\gamma'}{\gamma}. \tag{24}$$

To compute the load on each particular node $j$, $\lambda_j$, let us take into account that the
probability that a random walk visits a node is proportional to the node's degree
(see Section 2). This implies that, for each node $j \in V$, the load on node $j$ due to
search messages, denoted $\gamma'_j$, is proportional to its degree $k_j$. As a result,
there is a value $\tau$ such that $\gamma'_j = \tau k_j$ for all $j$. Hence, $\gamma' = \sum_j \gamma'_j = \tau d$, where
$d$ is the sum of all degrees in the network (i.e., $d = \sum_k k \, n_k$). Therefore,

$$\tau = \frac{\bar{l} \, \gamma}{d}. \tag{25}$$

Assuming that all nodes generate approximately the same number of new searches
($\gamma / n$), we can compute the average load at node $j$ as

$$\lambda_j = \tau \, k_j + \frac{\gamma}{n}, \tag{26}$$

where the first term represents the load due to search messages, and the second
term the searches generated at node $j$. Note that any other search generation rate
model can be implemented just by changing the term $\gamma / n$.
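As a small worked example of the per-node load, the sketch below splits a global load among
nodes in proportion to their degrees and adds the locally generated searches. The function
name `node_loads` and the toy values are assumptions made for illustration.

```python
def node_loads(degrees, avg_search_length, gamma):
    """Per-node load: lambda_j = tau * k_j + gamma / n, with tau = l * gamma / d."""
    n = len(degrees)
    d = sum(degrees)                          # sum of all degrees
    tau = avg_search_length * gamma / d       # forwarded load per unit of degree
    return [tau * k + gamma / n for k in degrees]


# Toy values: three nodes, average search length 2, injected load 3.
loads = node_loads([1, 2, 3], avg_search_length=2.0, gamma=3.0)
```

The per-node loads add up to $(1 + \bar{l})\gamma$, the total load on the system.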
3.2 Average Search Duration

In order to obtain the average search duration, $T_r$, we use Little's Law [33], which
states that

$$r = \gamma \times T_r, \tag{27}$$

where $r$ is the average number of resident searches in the network (i.e., searches
that are waiting or being served), and $\gamma$ is the average number of searches generated
per unit of time (i.e., the arrival rate of searches). Observe that $\gamma$ is assumed to be
known. Hence, the challenge in computing $T_r$ is to obtain $r$. Let $r_j$ be the number of
resident searches in node $j$. Then, $r = \sum_j r_j$.

To obtain $r_j$, we apply Little's Law again, this time individually to each node $j$:

$$r_j = \lambda_j \times T_r^j, \tag{28}$$

where $T_r^j$ is the average search time at node $j$ and $\lambda_j$ is the average load at node $j$,
which includes both searches generated at node $j$ and searches due to messages
from other nodes. Next we use that, by Jackson's Theorem [34] (recall we assume
the network to be a Jackson network), each node $j$ can be analyzed as a single
M/M/1 queue with Poisson arrival rate $\lambda_j$ and exponentially distributed service time
with mean $T_s^j$ (which can be computed from the node capacity, which we assume to
be known). Then:

$$T_r^j = \frac{T_s^j}{1 - \rho_j}, \tag{29}$$

where $\rho_j$ is the utilization rate and $T_s^j$ is the average service time at node $j$. As
$\rho_j = \lambda_j T_s^j$, we can write

$$T_r^j = \frac{T_s^j}{1 - \lambda_j T_s^j}. \tag{30}$$
Once we have $\lambda_j$ and $T_r^j$, we can combine them to obtain

$$T_r = \frac{r}{\gamma} = \frac{1}{\gamma} \sum_j r_j = \frac{1}{\gamma} \sum_j \frac{\lambda_j T_s^j}{1 - \lambda_j T_s^j} = \sum_j \frac{T_s^j \, (k_j \bar{l} n + d)}{n d - \gamma \, T_s^j \, (k_j \bar{l} n + d)}. \tag{31}$$

That is, we have provided an expression that computes the average search time
using the topology, the average service times of nodes, and the search arrival rate.
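The chain of steps above (per-node load, M/M/1 residence, Little's Law) can be followed
numerically with a short sketch. The function name `average_search_time` and the toy inputs
are assumptions made for this example; the average search length is taken as a given input.

```python
def average_search_time(degrees, service_times, avg_search_length, gamma):
    """Average search duration T_r = (1 / gamma) * sum_j rho_j / (1 - rho_j)."""
    n = len(degrees)
    d = sum(degrees)
    resident = 0.0
    for k, t_s in zip(degrees, service_times):
        load = k * avg_search_length * gamma / d + gamma / n   # lambda_j
        rho = load * t_s                                       # utilization of node j
        if rho >= 1.0:
            raise ValueError("node overloaded: no steady state exists")
        resident += rho / (1.0 - rho)                          # r_j of an M/M/1 queue
    return resident / gamma                                    # Little's Law


# Toy values: two nodes of degree 1, service time 0.1, search length 1, load 2.
t_r = average_search_time([1, 1], [0.1, 0.1], avg_search_length=1.0, gamma=2.0)
```

With these toy values each node has utilization 0.2, so each holds 0.25 resident searches on
average and the average search duration is 0.25 time units.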



3.3 Load Limit 



Implicitly, our previous results assume that no node is overloaded
(i.e., $\lambda_j < 1 / T_s^j$ for all $j$). Otherwise, the network would never reach a stable state.
Thus, a key value for any network is its load limit: the minimum search arrival rate
($\gamma$) that would overload the network, denoted by $\gamma_o$. Clearly, $\gamma_o = \min_j \{\gamma_o^j\}$,
$\gamma_o^j$ being the minimum search arrival rate that would overload node $j$.

From Equation 26, we have that

$$\lambda_j = k_j \frac{\bar{l} \, \gamma}{d} + \frac{\gamma}{n}. \tag{32}$$

Also, since no node must be overloaded, it must be satisfied that

$$\lambda_j < \frac{1}{T_s^j}. \tag{33}$$

Combining Equation 32 with Equation 33 we have that, for each $j$, the following
must hold:

$$\gamma < \frac{d \, n}{T_s^j \, (k_j \bar{l} n + d)}. \tag{34}$$

Therefore, the load limit for node $j$ is

$$\gamma_o^j = \frac{d \, n}{T_s^j \, (k_j \bar{l} n + d)}, \tag{35}$$

and

$$\gamma_o = \min_j \left\{ \frac{d \, n}{T_s^j \, (k_j \bar{l} n + d)} \right\}. \tag{36}$$

Fig. 14. Average Search Times. For the analytical values ($T_r$), we used Equation 31,
taking into account that $T_s^j$ follows an exponential distribution with average $\lambda_j$ (i.e.,
$T_s^j \sim \mathrm{Exponential}(\lambda_j)$), where $\lambda_j$ can be computed as the ratio between the number
of resources known and their processing capacity.
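The load limit can be evaluated directly from the degrees and service times, as in the sketch
below. The function name `load_limit` and the toy values are assumptions of the example, and
the average search length is again treated as a known input.

```python
def load_limit(degrees, service_times, avg_search_length):
    """Smallest arrival rate that saturates some node (minimum over all nodes)."""
    n = len(degrees)
    d = sum(degrees)
    return min(
        d * n / (t_s * (k * avg_search_length * n + d))
        for k, t_s in zip(degrees, service_times)
    )


# Toy values: with equal service times the highest-degree node saturates first.
limit = load_limit([1, 3], [0.1, 0.1], avg_search_length=1.0)
```

In the toy case the node of degree 3 sets the limit: at the returned rate its load multiplied by
its service time is exactly 1.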



3.4 Experimental Evaluation 
Average Search Duration. In this subsection, we present the results of a set of
experiments aimed at evaluating, in practice, the accuracy of our model for the
average search time. As in the previous experiments (Section 2.5), we conducted
extensive simulations over ER and small-world networks. All networks are made
up of $10^4$ nodes.

In each experiment, nodes generate new searches following a Poisson process with
rate $\gamma / n$, where $\gamma$ is the global load on the network. When a node starts a search for
a resource, it first checks whether it already knows that resource (i.e., if the node
itself or any of its neighbors hold the resource). If so, the search ends successfully.
Otherwise, a search message for the requested resource is created and sent to some
neighbor node chosen uniformly at random. When a node receives a search
message, it also verifies whether it knows the resource. If so, the search is finished.
Otherwise, the search is again forwarded to another neighbor chosen uniformly at
random. The experimental results are obtained by averaging the search times
measured over all the searches.

We used six different global loads ($\gamma$): $0.15 \gamma_o$, $0.3 \gamma_o$, $0.45 \gamma_o$, $0.6 \gamma_o$, $0.75 \gamma_o$
and $0.9 \gamma_o$, where $\gamma_o$ is the minimum arrival rate that would overload the network
(see Section 3.3). The distribution of the nodes' search processing capacities $c_j$ is
derived from the measured bandwidth distributions of Gnutella [35] (see Table 1).
Capacities are assigned so that nodes with a higher degree are given a higher
capacity. All nodes are assumed to have the same number of resources, $w = 10,000$.
Each resource is held by one node, and all resources have the same probability of
being chosen for search. The processing time at each node $j$ follows an exponential
distribution with an average service time computed as $T_s^j = w \, k_j / c_j$. This average
is computed by dividing the number of resources checked for each search (the total
number of resources known, $w (k_j + 1)$, minus the resources of the node the search
message came from, $w$) by the node's capacity.
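The service-time rule above amounts to a short computation, sketched here. The function
names and the capacity value are hypothetical (the capacities of Table 1 are not reproduced),
and the capacity is assumed to be expressed in resource checks per time unit.

```python
import random


def mean_service_time(resources_per_node, degree, capacity):
    """Average service time w * k_j / c_j: resources checked divided by capacity."""
    known = resources_per_node * (degree + 1)    # own resources plus neighbours'
    checked = known - resources_per_node         # skip the sender's resources
    return checked / capacity


def sample_service_time(mean, rng):
    """Draw one exponentially distributed service time with the given mean."""
    return rng.expovariate(1.0 / mean)


# Hypothetical node: w = 10,000 resources per node, degree 10, and an assumed
# capacity of 1,000,000 resource checks per time unit.
mean = mean_service_time(10_000, 10, 1_000_000)
sample = sample_service_time(mean, random.Random(42))
```

For this hypothetical node the mean service time is 0.1 time units; individual service times
drawn from the exponential distribution scatter around that value.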

For each load, we measured the average search times experimentally for each net- 
work. Results are shown in Fig. 14. It can be seen that, as expected, the average 
search time always increases with the load, undergoing a higher growth when it 
approaches the maximum arrival rate. Furthermore, our experimental results show 
a very close correspondence with the analytical results that were obtained. 



Load Limit. We have computed the $\gamma_o$ values for random and small-world
networks with different average degrees. For each kind of network and average degree,
five networks were built with the capacity distribution presented in Table 1. Our
goal was to observe the variation of $\gamma_o$ for networks of the same type and $\bar{k}$, and
also to study the difference among the $\gamma_o$ values depending on the network kind and
average degree.

Results, which are shown in Figure 15, differ for random and small-world networks.
The first thing to note is that small-world networks can handle a greater load than
random networks.

Small-world networks present variations of the $\gamma_o$ values even for networks of the
same average degree. Despite this variation, it is clear that the load limit tends to
grow with $\bar{k}$. The reason is that a greater $\bar{k}$ implies a smaller global load for
the same rate of queries injected into the system. Recall that the total load is given
by $(1 + \bar{l}) \gamma$ (Equation 23) and that higher average degrees lead to shorter average
search lengths $\bar{l}$ (Figure 12(b)). Hence, it is possible to perform more queries
before overloading the network.

Erdos-Renyi networks, however, behave in a very different manner. They present
very little variation of the $\gamma_o$ values. And, more surprisingly, there is a small decrease
of the load limit when $\bar{k}$ grows. This contrasts with the behavior of small-world
networks. As shown in Figure 12(a), larger average degrees imply shorter
average search lengths and so a smaller global load. However, the load that can be
handled by the network does not change accordingly. The reason seems to be
that in ER networks the load is more evenly distributed among nodes. This implies
that low-capacity nodes have to handle a significant share of the searches. Besides,
a greater average degree increases the average service times of these nodes,
as they know, and so have to process, more resources per search. Hence, these
nodes remain the bottleneck of the network despite the shorter average search
length, preventing the system from handling a greater load.

However, it is important to recall that these results also depend on the capacity
distribution used, and on how it was distributed among the nodes. In small-world networks,
if we assign low capacities to high-degree nodes we can expect them to become
bottlenecks of the network that force small $\gamma_o$ values. In ER networks, adding more
high-capacity nodes could change the $\gamma_o$ tendency so that it would grow with the average
degree. Exploring all these phenomena is beyond the scope of this paper.



3.5 Optimal Relationship between Degree and Capacity Distributions 

In this section we show that, when there is a full correlation between the capacity
of a node (i.e., the number of searches a node can process per time unit) and its
degree, this leads to a minimal value of the average search time $T_r$.

Let us first state the relation we assume between the capacity $c_j$ and the average
service time $T_s^j$ of a node $j$. We assume that the former is a parameter that does not
depend on the degree or the number of resources known by the node, and only
depends on the processor and network connection speeds. We assume that the latter
depends on the node's degree through a strictly increasing function $f(k_j)$. Specifically,
we assume that a node's service time is directly proportional to $f(k_j)$ and inversely
proportional to its capacity, as follows:

$$T_s^j = \frac{f(k_j)}{c_j}. \tag{37}$$



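Let us now consider a pair of nodes $i, j \in V$ such that $k_j > k_i$ (so $f(k_j) > f(k_i)$),
and two possible positive capacities $c_1$ and $c_2$ such that $c_1 > c_2$. We show that, if
no other degree or capacity assignment changes, having $c_j = c_1$ and $c_i = c_2$ gives
a smaller average search time, $T_r$, than the average search time $T'_r$ obtained with the
reverse assignment $c_j = c_2$ and $c_i = c_1$.

Using Eq. 37, we obtain the following possible average service times:

$$T_{s,1}^{j} = \frac{f(k_j)}{c_1}, \quad T_{s,1}^{i} = \frac{f(k_i)}{c_2}, \quad T_{s,2}^{j} = \frac{f(k_j)}{c_2}, \quad T_{s,2}^{i} = \frac{f(k_i)}{c_1}, \tag{38}$$

in which $T_{s,1}$ are the service times obtained with the first capacity assignment and
$T_{s,2}$ are the service times obtained with the second. From the above equations, we
have

$$T_{s,1}^{i} \, T_{s,1}^{j} = \frac{f(k_i) \, f(k_j)}{c_1 c_2} = T_{s,2}^{i} \, T_{s,2}^{j}, \tag{39}$$

and

$$T_{s,1}^{i} - T_{s,2}^{i} = f(k_i) \, \frac{c_1 - c_2}{c_1 c_2} < f(k_j) \, \frac{c_1 - c_2}{c_1 c_2} = T_{s,2}^{j} - T_{s,1}^{j}. \tag{40}$$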
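Let $\lambda_i$ and $\lambda_j$ be the loads on $i$ and $j$. Since $k_i < k_j$, then $\lambda_i < \lambda_j$. Hence, from this
and Eq. 40, we find that

$$\lambda_i T_{s,1}^{i} + \lambda_j T_{s,1}^{j} < \lambda_i T_{s,2}^{i} + \lambda_j T_{s,2}^{j}. \tag{41}$$

To compute the values $T_r$ and $T'_r$, we use Eq. 31:

$$T_r = \frac{r}{\gamma} = \frac{r_i + r_j + \sum_{h \neq i,j} r_h}{\gamma}, \tag{42}$$

$$T'_r = \frac{r'}{\gamma} = \frac{r'_i + r'_j + \sum_{h \neq i,j} r_h}{\gamma}, \tag{43}$$

where $r_i$ and $r_j$ are obtained with the first capacity assignment and $r'_i$ and $r'_j$ with
the second. Observe that $r_h$ remains the same for any node $h$ that is neither $i$ nor
$j$, because its degree, load, and capacity are the same in both cases. Hence, if
$r_i + r_j < r'_i + r'_j$ then $T_r < T'_r$.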



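From Eqs. 28 and 30, we obtain that

$$r_i + r_j = \frac{\lambda_i T_{s,1}^{i}}{1 - \lambda_i T_{s,1}^{i}} + \frac{\lambda_j T_{s,1}^{j}}{1 - \lambda_j T_{s,1}^{j}} = \frac{\lambda_i T_{s,1}^{i} + \lambda_j T_{s,1}^{j} - 2 \lambda_i \lambda_j T_{s,1}^{i} T_{s,1}^{j}}{1 - (\lambda_i T_{s,1}^{i} + \lambda_j T_{s,1}^{j}) + \lambda_i \lambda_j T_{s,1}^{i} T_{s,1}^{j}}, \tag{44}$$

and

$$r'_i + r'_j = \frac{\lambda_i T_{s,2}^{i}}{1 - \lambda_i T_{s,2}^{i}} + \frac{\lambda_j T_{s,2}^{j}}{1 - \lambda_j T_{s,2}^{j}} = \frac{\lambda_i T_{s,2}^{i} + \lambda_j T_{s,2}^{j} - 2 \lambda_i \lambda_j T_{s,2}^{i} T_{s,2}^{j}}{1 - (\lambda_i T_{s,2}^{i} + \lambda_j T_{s,2}^{j}) + \lambda_i \lambda_j T_{s,2}^{i} T_{s,2}^{j}}. \tag{45}$$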



Finally, applying Eqs. 39 and 41, we conclude that

$$r_i + r_j < r'_i + r'_j, \tag{46}$$

and hence

$$T_r < T'_r. \tag{47}$$

This proves that, for a given degree distribution, the best performance will be
obtained by assigning the largest capacities to the nodes with the largest degrees. Note
that we have found a condition that is necessary in order to attain the minimum
possible $T_r$, once the degree distribution has been set. However, different degree
distributions can obtain very different $T_r$ values.
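The effect of the capacity assignment can be checked numerically on a toy instance. The sketch
below is an illustration only: the function name `search_time`, the choice $f(k) = k$, and all
numeric values are assumptions of the example. It evaluates the average search time for a
degree-matched assignment and for one in which the two highest-degree nodes swap capacities.

```python
def search_time(degrees, capacities, avg_search_length, gamma, f=lambda k: k):
    """Average search time with service times f(k_j) / c_j (f is an assumption)."""
    n = len(degrees)
    d = sum(degrees)
    resident = 0.0
    for k, c in zip(degrees, capacities):
        load = k * avg_search_length * gamma / d + gamma / n   # load on the node
        rho = load * f(k) / c                                  # utilization
        if rho >= 1.0:
            raise ValueError("node overloaded: no steady state exists")
        resident += rho / (1.0 - rho)
    return resident / gamma


degrees = [2, 4, 6, 8]
# Larger capacities on larger degrees versus the two top nodes swapped.
matched = search_time(degrees, [10, 20, 30, 40], avg_search_length=3.0, gamma=1.0)
swapped = search_time(degrees, [10, 20, 40, 30], avg_search_length=3.0, gamma=1.0)
```

In this toy instance the matched assignment yields the smaller average search time, in line with
the pairwise exchange argument above.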
4 Conclusions



In this paper, we have presented an analytical model that allows us to predict the 
behavior of random walks. Furthermore, we have also performed some experiments 
that confirm the correctness of our expressions. 

Some work can be carried out to complement our results. For instance, several
random walks can be used at the same time, a situation that could be used to further
improve the efficiency of the search mechanism. These random walks could run
independently or, in order to cover separate regions of the graph, coordinate
among themselves in some way.